PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Data Subset Selection
Abstract
With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which include focusing on or targeting certain data points while avoiding others. Examples of such problems include: i) targeted learning, where the goal is to find subsets with rare classes or rare attributes on which the model is underperforming, and ii) guided summarization, where data (e.g., image collection, text, document or video) is summarized for quicker human consumption with specific additional user intent. Motivated by such applications, we present PRISM, a rich class of PaRameterIzed Submodular information Measures. Through novel functions and their parameterizations, PRISM offers a variety of modeling capabilities that enable a trade-off between desired qualities of a subset, like diversity or representation, and similarity/dissimilarity with a set of data points. We demonstrate how PRISM can be applied to the two real-world problems mentioned above, which require guided subset selection. In doing so, we show that PRISM interestingly generalizes some past work, therein reinforcing its broad utility. Through extensive experiments on diverse datasets, we demonstrate the superiority of PRISM over the state-of-the-art in targeted learning and in guided image-collection summarization. PRISM is available as part of the SUBMODLIB (https://github.com/decile-team/submodlib) and TRUST (https://github.com/decile-team/trust) toolkits.
Similar resources
Unsupervised Submodular Subset Selection for Speech Data: Extended Version
We conduct a comparative study on selecting subsets of acoustic data for training phone recognizers. The data selection problem is approached as a constrained submodular optimization problem. Previous applications of this approach required transcriptions or acoustic models trained in a supervised way. In this paper we develop and evaluate a novel and entirely unsupervised approach, and apply it...
Causal meets Submodular: Subset Selection with Directed Information
We study causal subset selection with Directed Information as the measure of prediction causality. Two typical tasks, causal sensor placement and covariate selection, are correspondingly formulated into cardinality constrained directed information maximizations. To attack the NP-hard problems, we show that the first problem is submodular while not necessarily monotonic. And the second one is “n...
How to select a good training-data subset for transcription: submodular active selection for sequences
Given a large un-transcribed corpus of speech utterances, we address the problem of how to select a good subset for word-level transcription under a given fixed transcription budget. We employ submodular active selection on a Fisher-kernel based graph over un-transcribed utterances. The selection is theoretically guaranteed to be near-optimal. Moreover, our approach is able to bootstrap without ...
Causal Markov Condition for Submodular Information Measures
The causal Markov condition (CMC) is a postulate that links observations to causality. It describes the conditional independences among the observations that are entailed by a causal hypothesis in terms of a directed acyclic graph. In the conventional setting, the observations are random variables and the independence is a statistical one, i.e., the information content of observations is measur...
A New Hybrid Feature Subset Selection Algorithm for the Analysis of Ovarian Cancer Data Using Laser Mass Spectrum
Introduction: A major problem in the treatment of cancer is the lack of an appropriate method for the early diagnosis of the disease. The chemical reaction within an organ may be reflected in the form of proteomic patterns in the serum, sputum, or urine. Laser mass spectrometry is a valuable tool for extracting the proteomic patterns from biological samples. A major challenge in extracting such ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2022
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v36i9.21264